Maps of Uncertinaty

The techniques in this section allow us to create maps of where our uncertinaty may be higher or lower. The most straight forward is a map that shows the standard deviation of our output results after a series of MC runs. The approach is to:

Create a loop that creates a model and a model prediction with a random component. The random component could be cross-validation, noise injection, sensitivity testing, or some combination of all of these.
After each model is created, do a prediction and compute the residuals.
Save the residuals for further analysis.
After the runs are completed, compute the standard deviation of the residuals.
Use the standard deviation of the residuals to create an uncertinaty map.

This process can be used to create a wide variety of uncertianty maps.

Basic Method

Note that this section describes portions of the code. The full code is in the next section

The solution below will work for creating uncertianty maps in most situations until we use cross-validation and are using the original data for the prediction. The next section shows an approach for cross validation.

This approach uses a matrix to capture all the predicted values. Each time a model run is completed, the predicted values are added as a column to the matrix. After all runs are finished, the standard deviation of each row can be computed to create a vector that can be added back into a dataframe and then used to create an uncertianty map. This provides a map of the standard deviation of the predictors.

The first step is to include the 'matrixStats' library which includes a number of functions for matrixes including one to compute the standard deviation of the rows in a matrix.

# matrixStats includes a function to compute the std dev of a row
require(matrixStats)

Next, create an empty matrix that has the same number of rows as the dataframe and the number of columns the same as the number of times you will repeat the process (e.g. Number of folks for k-fold validation).

# create an empty matrix 
PredictionMatrix=matrix(, nrow = NumRowsInDataFrame, ncol = 1)

Then, inside your loop, compute the residuals vector for all rows in your dataframe and add it to your residuals matrix.

# go through the residuals adding them into the residuals matrix 
i=1
NumRows=length(Residuals)
while (i<=NumRows)
{
	OriginalIndex=OriginalIndexes[i] # get the original index for this data point
	
	Value=unname(Residuals[i]) # get the Residual value without the name
	
	ResidualsMatrix[OriginalIndex,1]=Value # set the value into the residuals matrix
	
	i=i+1
}

Finally, after the loop, compute the standard deviations for the rows in the matrix. This will result in a vector that can be added to the dataframe and then used to create a map. Note that matrixStats also includes functions for computing means and other matrix values that can be used to evalute models.

# get the standard deviation for each row in the predicted values matrix 
PredictionStdDevs=rowSds(PredictionMatrix) 


# get the mean for each row in the predicted values matrix
PredictionMeans=rowMeans(PredictionMatrix)

Creating Uncertianty Maps With A Dataset that Covers the Entire Study Area.

There are times when we will have a continous response varaible and we are preidcting values on the same data set that we used to create the mdoel. These are rare but would include when we have a biomass or land cover data set, created and model of biomass, and then ran it in future senarios.

The approach below uses k-fold cross-validation to find the residuals and then adds the residuals back into our original data so we can use them to create an uncertinaty map. Using k-fold cross-validation gives us one value for each point in our sample. This function could be changed to use a 70/30 split or to do mulitple k-fold cross-validations or other approachs. The one problem I ran into with 70/30 split is that you can end up with points that do not have a residual or just have one making caclulation of a std dev of the residuals impossible.

# matrixStats includes a function to compute the std dev of a row
require(matrixStats) 

# load the file with the field measurements for DougFir height
TheData = read.csv("C:\\Users\\jim\\Desktop\\GSP 570\\Lab 07\\DougFir_All_CA_8km.csv")

# Remove any NA (null) values, the data should already be loaded
TheData=na.omit(TheData)

# Setup the number of loops and number of folds for each loop
NumFolds=10 # number of folds (loops)
NumLoops=2 # number of loops (repeat k-fold validation)

# create an empty matrix for the residuals
# Each row is for one row in our data, each column is for the residuals
# from one run through the loop below.
OutputMatrix=matrix(, nrow = nrow(TheData), ncol = NumLoops) 

# add a column with ids for the original rows
TheData$ID=seq(1:nrow(TheData)) 

# Repeat entire k-fold validcation
LoopCount=1
while (LoopCount<=NumLoops)
{
  # create the folds (buckets) for the data
  TheData2 <- TheData[sample(nrow(TheData)),] # shuffle the data
  TheFolds <- cut(seq(1,nrow(TheData2)),breaks=NumFolds,labels=FALSE) # divide the data into buckets

  FoldCount=1
  while (FoldCount<=NumFolds)
  {
    # split the data into test and training
    testIndexes <- which(TheFolds==FoldCount) # get the indexes to the next fold in the data
    testData <- TheData2[testIndexes, ] # used the testindexes to sub-sample the data into a test dataset
    trainData <- TheData2[-testIndexes, ] # use the training indexes to sub-sample the data into a training dataset
    
    # Create a GAM based on the training data
    attach(trainData)
    TheModel=gam(Height~s(AnnualPrecip),data=trainData,gamma=1.4) 
    detach("trainData")
    
    # create the preidction using the test data
    attach(testData)
    TestPrediction=predict.gam(TheModel,testData,type='response')
    detach("testData")
    
    # compute the output values (default to the prediction)
    OutputValues=TestPrediction-TheData$MaxHt
    
    # get the original indexes for the test data to update the residuals
    OriginalIndexes=TheData2$ID[testIndexes]
    
    # go through the residuals adding them into the residuals matrix 
    i=1
    NumRows=length(OutputValues)
    while (i<=NumRows)
    {
      OriginalIndex=OriginalIndexes[i] # get the original index for this data point
      
      Value=unname(OutputValues[i]) # get the Residual value
      
      OutputMatrix[OriginalIndex,LoopCount]=Value # set the value into the residuals matrix
      
      i=i+1
    }
    FoldCount=FoldCount+1
  }
  LoopCount=LoopCount+1
}
OutputStdDev=rowSds(OutputMatrix)
TheData$SD=OutputStdDev

Creating an Uncertainty Map with Field Data

The following is a modification of the code above to use cross validation on a set of field data (points) to create an uncertinaty map.

This was created by Zane Kuser

#####################################################
# Uncertainty Map

# computing the standard deviation of the model's predictions at each pixel for the full study site
# based on NumFolds number of predictions at that pixel
# where each prediction is made by a model trained on NumFolds-1 folds (subsets) of the labeled training data
# with a different fold held out from the training data on each iteration

library(rpart) # modeling library (decision trees in this case)
library(matrixStats) # includes a function to compute the std dev of a row

# load labeled training data
LaiPath = r"(C:\Users\jg2345\Desktop\GSP 570\Zane's Code\lai_points.csv)" # measured response values (labels)
PansharpPath = r"(C:\Users\jg2345\Desktop\GSP 570\Zane's Code\landsat_pansharpened.csv)" # covariates sampled at labeled points (points where response values were measured)

LaiPoints = read.csv(LaiPath) # read response values (labels)
TheData = read.csv(PansharpPath) # create dataframe of covariates
TheData$LAI = LaiPoints$LAI # add labels (measured response values) to covariate dataframe

# load the covariate values for the full study site (unlabeled data)
NewDataPath = r"(C:\Users\jg2345\Desktop\GSP 570\Zane's Code\SchatzPanVals.csv)"
NewData = read.csv(NewDataPath)

# Remove any 0 or NA (null) values
NewData[NewData==0] = NA
NewData=na.omit(NewData)

# Setup the number of folds
NumFolds=10 # number of folds (loops)

# create an empty matrix for the predictions
# Each row is for one row in our data, each column is for the predictions
# from one run through the loop below.
OutputMatrix=matrix(, nrow = nrow(NewData), ncol = NumFolds)

# create the folds (buckets) for the data
ShuffledData = TheData[sample(nrow(TheData)),] # shuffle the labeled data
TheFolds = cut(seq(1,nrow(ShuffledData)),breaks=NumFolds,labels=FALSE) # divide the data into buckets

# Repeat k-fold validation
FoldCount=1
while (FoldCount<=NumFolds)
{
  # split the data into test and training
  holdoutIndexes = which(TheFolds==FoldCount) # get the indexes to the next fold in the data
  holdoutData = ShuffledData[holdoutIndexes, ] # use the holdout indexes to sub-sample the data into a holdout dataset (one fold)
  trainData = ShuffledData[-holdoutIndexes, ] # use the training indexes to sub-sample the data into a training dataset (the rest of the folds)
  
  # Create a model based on the sub-sampled training data
  TheModel = rpart(LAI~band2+band5, data = trainData, control = rpart.control(minsplit = 4, cp = 0.01))
  
  # create the prediction on the full study site
  ThePrediction = predict(TheModel,newdata = NewData)
  
  # add the predictions to the prediction matrix
  OutputMatrix[,FoldCount] = ThePrediction
  
  FoldCount = FoldCount + 1
}

OutputStdDev=rowSds(OutputMatrix) # compute standard deviation of NumFolds number of predictions
NewData$SD=OutputStdDev # add standard deviation of predictions to full study site covariate dataframe

# write full study site covariate dataframe (with SD of predictions now included) to csv for conversion to raster and mapping
OutputPath = r"(C:\Users\jg2345\Desktop\GSP 570\Zane's Code\SDofPredictions.csv)"
write.csv(NewData, OutputPath, row.names = FALSE)

#####################################################

R for Spatial Statistics

Maps of Uncertinaty

Basic Method

Creating Uncertianty Maps With A Dataset that Covers the Entire Study Area.

Creating an Uncertainty Map with Field Data